Union County
Large Language Models Still Face Challenges in Multi-Hop Reasoning with External Knowledge
We carry out a series of experiments to test large language models' multi-hop reasoning ability from three aspects: selecting and combining external knowledge, dealing with non-sequential reasoning tasks, and generalising to data samples with larger numbers of hops. We test the GPT-3.5 model on four reasoning benchmarks with Chain-of-Thought prompting (and its variations). Our results reveal that, despite the strong performance large language models achieve on various reasoning tasks, they still suffer from severe drawbacks that leave a large gap with human performance.
2412.08317
Country:
- Asia > China (0.14)
- North America > United States > California > Los Angeles County (0.14)
- Asia > South Korea (0.05)
Industry:
- Media > Film (1.00)
- Leisure & Entertainment > Sports > Basketball (1.00)
- Health & Medicine > Therapeutic Area (1.00)
Technology: